AITopics | open-vocabulary model

Neural Information Processing Systems, Nov-15-2025, 23:04:25 GMT

A. Dataset details: In Table 3, we present the number of classes and the size of the training, validation and test sets we use for each of the patching and supported tasks: Stanford Cars [35

This practice, introduced by Radford et al.

accuracy, artificial intelligence, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Neural Information Processing Systems, Nov-15-2025, 23:04:19 GMT

Patching open-vocabulary models by interpolating weights. Gabriel Ilharco, Mitchell Wortsman

However, there are still settings where their zero-shot performance is far from optimal.

accuracy, machine learning, natural language, (14 more...)

Country:

South America > Brazil (0.14)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Poland (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Neural Information Processing Systems, Oct-9-2025, 07:58:01 GMT

Neural Priming for Sample-Efficient Adaptation. Matthew Wallingford, Vivek Ramanujan, Alex Fang, Aditya Kusupati

large language model, machine learning, neural priming, (18 more...)

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
(2 more...)

Neural Information Processing Systems, Aug-18-2025, 08:31:17 GMT

bc6cddcd5d325e1c0f826066c1ad0215-Supplemental-Conference.pdf

This practice, introduced by Radford et al.

accuracy, artificial intelligence, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Abdalwhab, Abdalwhab, Imran, Ali, Heydarian, Sina, Iordanova, Ivanka, St-Onge, David

Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites

arXiv.org Artificial Intelligence, Jan-15-2025

The construction industry has long explored robotics and computer vision, yet their deployment on construction sites remains very limited. These technologies have the potential to revolutionize traditional workflows by enhancing accuracy, efficiency, and safety in construction management. Ground robots equipped with advanced vision systems could automate tasks such as monitoring mechanical, electrical, and plumbing (MEP) systems. The present research evaluates the applicability of open-vocabulary vision-language models compared to fine-tuned, lightweight, closed-set object detectors for detecting MEP components using a mobile ground robotic platform. A dataset collected with cameras mounted on a ground robot was manually annotated and analyzed to compare model performance. The results demonstrate that, despite the versatility of vision-language models, fine-tuned lightweight models still largely outperform them in specialized environments and for domain-specific tasks.

artificial intelligence, dataset, open-vocabulary model, (13 more...)

2501.09267

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Quebec > Montreal (0.05)
Europe > Portugal > Braga > Braga (0.04)

Genre: Research Report > New Finding (0.49)

Industry: Construction & Engineering (0.71)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

arXiv.org Artificial Intelligence, Dec-4-2023

Neural Priming for Sample-Efficient Adaptation

Wallingford, Matthew, Ramanujan, Vivek, Fang, Alex, Kusupati, Aditya, Mottaghi, Roozbeh, Kembhavi, Aniruddha, Schmidt, Ludwig, Farhadi, Ali

We propose Neural Priming, a technique for adapting large pretrained models to distribution shifts and downstream tasks given few or no labeled examples. Presented with class names or unlabeled test samples, Neural Priming enables the model to recall relevant data seen throughout pretraining and condition its parameters on that data, thereby priming it for the test distribution. Neural Priming can be performed at test time, even for pretraining datasets as large as LAION-2B. Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks. Concretely, in the zero-shot setting, we see a 2.45% improvement in accuracy on ImageNet and a 3.81% accuracy improvement on average across standard transfer learning benchmarks. Further, using Neural Priming at inference to adapt to distribution shift, we see a 1.41% accuracy improvement on ImageNetV2. These results demonstrate the effectiveness of Neural Priming in addressing the challenge of limited labeled data and changing distributions. Code is available at github.com/RAIVNLab/neural-priming.
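The recall-then-update idea in the abstract above can be illustrated with a toy sketch. This is not the authors' procedure: it assumes recall is a cosine-similarity top-k lookup over a pool of pretraining embeddings, and that the lightweight update blends a zero-shot class weight with the centroid of the recalled items. The names `recall_top_k`, `prime_class_weight`, `k`, and `mix` are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_top_k(query: Vector, pool: List[Vector], k: int) -> List[Vector]:
    """Assumed recall step: return the k pool embeddings most similar to the query."""
    ranked = sorted(pool, key=lambda item: cosine(query, item), reverse=True)
    return ranked[:k]


def centroid(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def prime_class_weight(text_weight: Vector, pool: List[Vector], k: int, mix: float) -> Vector:
    """Assumed update step: blend a zero-shot class weight with the recalled centroid."""
    recalled = recall_top_k(text_weight, pool, k)
    center = centroid(recalled)
    return [(1.0 - mix) * t + mix * c for t, c in zip(text_weight, center)]


# Toy 2-D embeddings standing in for a pretraining pool (illustrative values only).
pool = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [-1.0, 0.0]]
class_weight = [1.0, 0.0]

primed = prime_class_weight(class_weight, pool, k=2, mix=0.5)
print(primed)
```

With `mix=0.0` the zero-shot weight is returned unchanged, so in this sketch `mix` controls how strongly the recalled data moves the classifier.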

large language model, machine learning, neural priming, (19 more...)

2306.10191

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.55)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Gadre, Samir Yitzhak, Wortsman, Mitchell, Ilharco, Gabriel, Schmidt, Ludwig, Song, Shuran

CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation

arXiv.org Artificial Intelligence, Dec-14-2022

For robots to be generally useful, they must be able to find arbitrary objects described by people (i.e., be language-driven) even without expensive navigation training on in-domain data (i.e., perform zero-shot inference). We explore these capabilities in a unified setting: language-driven zero-shot object navigation (L-ZSON). Inspired by the recent success of open-vocabulary models for image classification, we investigate a straightforward framework, CLIP on Wheels (CoW), to adapt open-vocabulary models to this task without fine-tuning. To better evaluate L-ZSON, we introduce the Pasture benchmark, which considers finding uncommon objects, objects described by spatial and appearance attributes, and hidden objects described relative to visible objects. We conduct an in-depth empirical study by directly deploying 21 CoW baselines across Habitat, RoboTHOR, and Pasture. In total, we evaluate over 90k navigation episodes and find that (1) CoW baselines often struggle to leverage language descriptions but are proficient at finding uncommon objects, and (2) a simple CoW, with CLIP-based object localization, classical exploration, and no additional training, matches the navigation efficiency of a state-of-the-art ZSON method trained for 500M steps on Habitat MP3D data. This same CoW provides a 15.6 percentage point improvement in success over a state-of-the-art RoboTHOR ZSON model.

large language model, natural language, navigation, (19 more...)

2203.10421

Country: North America (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

Ilharco, Gabriel, Wortsman, Mitchell, Gadre, Samir Yitzhak, Song, Shuran, Hajishirzi, Hannaneh, Kornblith, Simon, Farhadi, Ali, Schmidt, Ludwig

Patching open-vocabulary models by interpolating weights

arXiv.org Artificial Intelligence, Oct-11-2022

Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method that uses interpolations between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched. On nine tasks where zero-shot CLIP performs poorly, PAINT increases accuracy by 15 to 60 percentage points while preserving accuracy on ImageNet within one percentage point of the zero-shot model. PAINT also allows a single model to be patched on multiple tasks and improves with model scale. Furthermore, we identify cases of broad transfer, where patching on one task increases accuracy on other tasks even when the tasks have disjoint classes. Finally, we investigate applications beyond common benchmarks such as counting or reducing the impact of typographic attacks on CLIP. Our findings demonstrate that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.
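The interpolation described in the abstract above can be written as a per-parameter linear blend, theta_patched = (1 - alpha) * theta_zero_shot + alpha * theta_fine_tuned. The following is a minimal sketch under stated assumptions, not the authors' code: checkpoints are modeled as plain dictionaries of float lists, and the function name `interpolate_weights` and the mixing coefficient `alpha` are hypothetical names. How the coefficient is chosen is not stated in the abstract and is not shown here.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def interpolate_weights(theta_zero_shot: Weights, theta_fine_tuned: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * theta_zero_shot + alpha * theta_fine_tuned, per parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if theta_zero_shot.keys() != theta_fine_tuned.keys():
        raise ValueError("both checkpoints must have the same parameter names")
    patched: Weights = {}
    for name, zs_values in theta_zero_shot.items():
        ft_values = theta_fine_tuned[name]
        if len(zs_values) != len(ft_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        patched[name] = [(1.0 - alpha) * zs + alpha * ft for zs, ft in zip(zs_values, ft_values)]
    return patched


# Toy checkpoints standing in for real model weights (illustrative values only).
zero_shot = {"w": [0.0, 2.0], "b": [1.0]}
fine_tuned = {"w": [4.0, 2.0], "b": [3.0]}

patched = interpolate_weights(zero_shot, fine_tuned, alpha=0.5)
print(patched)
```

In this sketch, `alpha=0.0` recovers the weights before fine-tuning and `alpha=1.0` recovers the fine-tuned weights; intermediate values trade accuracy on the patched task against accuracy on tasks where the original model was already adequate.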

accuracy, large language model, machine learning, (19 more...)